-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix insert to Clickhouse TimestampWithTimeZone #23785
Fix insert to Clickhouse TimestampWithTimeZone #23785
Conversation
plugin/trino-clickhouse/src/main/java/io/trino/plugin/clickhouse/ClickHouseClient.java
Show resolved
Hide resolved
{ | ||
return (statement, index, value) -> { | ||
long millisUtc = unpackMillisUtc(value); | ||
TimeZoneKey timeZoneKey = unpackZoneKey(value); | ||
statement.setObject(index, Instant.ofEpochMilli(millisUtc).atZone(timeZoneKey.getZoneId())); | ||
statement.setObject(index, Instant.ofEpochMilli(millisUtc).atZone(columnTimeZone.toZoneId())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to perform this translation ? setObject
receives a ZonedDateTime
which is when handled by ClickhouseValues
(as a part of InputBasedPreparedStatement
-> ClickHousePreparedStatement
) would convert them to UTC and writes them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other options I tried didn't work for me. Do you something concrete in mind?
https://clickhouse.com/docs/en/sql-reference/data-types/datetime#examples
- When inserting datetime as an integer, it is treated as Unix Timestamp (UTC). 1546300800 represents '2019-01-01 00:00:00' UTC. However, as timestamp column has Asia/Istanbul (UTC+3) timezone specified, when outputting as string the value will be shown as '2019-01-01 03:00:00'
- When inserting string value as datetime, it is treated as being in column timezone. '2019-01-01 00:00:00' will be treated as being in Asia/Istanbul timezone and saved as 1546290000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant do we need this translation ? Currently Clickhouse's PreparedStatement implementation performs this translation i.e statement.setObject(index, Instant.ofEpochMilli(millisUtc).atZone(timeZoneKey.getZoneId())); - internally would operate on ZonedDataTime
- to fetch the UTC time and would update them as BigDecimal
- so I think this conversion to columnTimeZone
- We could set statement.setObject(index, Instant.ofEpochMilli(millisUtc))
which could save us some cpu cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it fail or ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tests fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stacktrace
io.trino.testing.QueryFailedException: Code: 62. DB::Exception: Syntax error: failed at position 98 (':'): :56Z. Expected one of: token, DoubleColon. (SYNTAX_ERROR) (version 23.8.12.13 (official build))
at io.trino.testing.AbstractTestingTrinoClient.execute(AbstractTestingTrinoClient.java:134)
at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:565)
at io.trino.testing.DistributedQueryRunner.execute(DistributedQueryRunner.java:548)
at io.trino.testing.datatype.SqlDataTypeTest.verifyPredicate(SqlDataTypeTest.java:129)
at io.trino.testing.datatype.SqlDataTypeTest.execute(SqlDataTypeTest.java:91)
at io.trino.plugin.clickhouse.BaseClickHouseTypeMapping.testClickHouseDateTimeWithTimeZone(BaseClickHouseTypeMapping.java:1014)
at io.trino.plugin.clickhouse.BaseClickHouseTypeMapping.testRepeated(BaseClickHouseTypeMapping.java:1202)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:212)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:194)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:212)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:212)
at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:611)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:291)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1709)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:611)
at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:507)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1489)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:2071)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2033)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)
Suppressed: java.lang.Exception: SQL: SELECT 'all found' FROM tpch.test_timestamp_with_time_zoned17ore1fpd WHERE col_0 IS NOT DISTINCT FROM TIMESTAMP '2024-01-01 05:19:56 +05:45'
at io.trino.testing.DistributedQueryRunner.executeInternal(DistributedQueryRunner.java:572)
... 35 more
Caused by: io.trino.spi.TrinoException: Code: 62. DB::Exception: Syntax error: failed at position 98 (':'): :56Z. Expected one of: token, DoubleColon. (SYNTAX_ERROR) (version 23.8.12.13 (official build))
at io.trino.plugin.jdbc.JdbcRecordCursor.handleSqlException(JdbcRecordCursor.java:312)
at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:187)
at io.trino.$gen.CursorProcessor_20241016_080419_108.process(Unknown Source)
at io.trino.operator.ScanFilterAndProjectOperator$RecordCursorToPages.process(ScanFilterAndProjectOperator.java:329)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils$YieldingProcess.process(WorkProcessorUtils.java:181)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils$BlockingProcess.process(WorkProcessorUtils.java:207)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:317)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:240)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:255)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
at io.trino.operator.Driver.processInternal(Driver.java:403)
at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
at io.trino.operator.Driver.tryWithLock(Driver.java:709)
at io.trino.operator.Driver.process(Driver.java:298)
at io.trino.operator.Driver.processForDuration(Driver.java:269)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
at io.trino.$gen.Trino_testversion____20241016_080414_36.run(Unknown Source)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:172)
at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:159)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.sql.SQLException: Code: 62. DB::Exception: Syntax error: failed at position 98 (':'): :56Z. Expected one of: token, DoubleColon. (SYNTAX_ERROR) (version 23.8.12.13 (official build))
at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:167)
... 40 more
Caused by: java.util.concurrent.ExecutionException: java.sql.BatchUpdateException: Code: 62. DB::Exception: Syntax error: failed at position 98 (':'): :56Z. Expected one of: token, DoubleColon. (SYNTAX_ERROR) (version 23.8.12.13 (official build))
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:596)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:555)
at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91)
at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:163)
... 40 more
Caused by: java.sql.BatchUpdateException: Code: 62. DB::Exception: Syntax error: failed at position 98 (':'): :56Z. Expected one of: token, DoubleColon. (SYNTAX_ERROR) (version 23.8.12.13 (official build))
at com.clickhouse.jdbc.SqlExceptionUtils.batchUpdateError(SqlExceptionUtils.java:107)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeAny(SqlBasedPreparedStatement.java:223)
at com.clickhouse.jdbc.internal.SqlBasedPreparedStatement.executeQuery(SqlBasedPreparedStatement.java:286)
at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryStatement.wrapCall(OpenTelemetryStatement.java:304)
at io.opentelemetry.instrumentation.jdbc.internal.OpenTelemetryPreparedStatement.executeQuery(OpenTelemetryPreparedStatement.java:64)
at io.trino.plugin.jdbc.JdbcRecordCursor.lambda$advanceNextPosition$1(JdbcRecordCursor.java:158)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at com.google.common.util.concurrent.DirectExecutorService.execute(DirectExecutorService.java:51)
at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:79)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
at io.trino.plugin.jdbc.JdbcRecordCursor.advanceNextPosition(JdbcRecordCursor.java:156)
... 40 more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we could try using
When inserting datetime as an integer, it is treated as Unix Timestamp (UTC). 1546300800 represents '2019-01-01 00:00:00' UTC. However, as timestamp column has Asia/Istanbul (UTC+3) timezone specified, when outputting as string the value will be shown as '2019-01-01 03:00:00'
We have the millisUtc
we need to convert it to second and insert them. I think we could implement it as a follow up once we add support for timestamp with higher precision as well.
Can we add this
Clickhouse JDBC driver inserts datetime as string value as
yyyy-MM-dd HH:mm:ss
and zone from the Column metadata would be used.
As a code comment as it would be a bit confusing on why we need to convert it into a specific timezone as it would be same as Instant.ofEpochMilli(millisUtc).atZone(columnTimeZone.toZoneId())
@Praveen2112 all comments are addressed. Please approve. |
SqlDataTypeTest.create() | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 05:19:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 Asia/Kathmandu'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 12:34:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 +00:00'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 18:19:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 -01:00'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 19:19:56 +05:45'") | ||
.execute(getQueryRunner(), session, clickhouseCreateAndTrinoInsert("tpch.test_timestamp_with_time_zone")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Praveen2112 without the change tests fail with:
java.lang.AssertionError: [Rows for query [SELECT * FROM tpch.test_timestamp_with_time_zonevvs0lcvaz4]]
Expecting actual:
(2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45)
to contain exactly in any order:
[(2024-01-01T05:19:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T18:19:56+05:45, 2024-01-01T19:19:56+05:45)]
elements not found:
(2024-01-01T05:19:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T18:19:56+05:45, 2024-01-01T19:19:56+05:45)
and elements not expected:
(2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45, 2024-01-01T12:34:56+05:45)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about trinoCreateAsSelect
, trinoCreateAndInsert
as it could affect for insert operation and CTAS operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clickhouse does not support Create table with TimestampWithTimeZone yet, thus only insert is tested to columns created by clickhouse.
Commit message updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can we update the commit message and PR description as Fix writing data into clickhouse for TImestampWithTimeZone
as it would be affected for both creates and inserts.
Plus additional test coverage for those operations.
SqlDataTypeTest.create() | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 05:19:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 Asia/Kathmandu'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 12:34:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 +00:00'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 18:19:56 +05:45'") | ||
.addRoundTrip("DateTime('Asia/Kathmandu')", "timestamp '2024-01-01 12:34:56 -01:00'", TIMESTAMP_TZ_SECONDS, "TIMESTAMP '2024-01-01 19:19:56 +05:45'") | ||
.execute(getQueryRunner(), session, clickhouseCreateAndTrinoInsert("tpch.test_timestamp_with_time_zone")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about trinoCreateAsSelect
, trinoCreateAndInsert
as it could affect for insert operation and CTAS operation.
{ | ||
return (statement, index, value) -> { | ||
long millisUtc = unpackMillisUtc(value); | ||
TimeZoneKey timeZoneKey = unpackZoneKey(value); | ||
statement.setObject(index, Instant.ofEpochMilli(millisUtc).atZone(timeZoneKey.getZoneId())); | ||
statement.setObject(index, Instant.ofEpochMilli(millisUtc).atZone(columnTimeZone.toZoneId())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we could try using
When inserting datetime as an integer, it is treated as Unix Timestamp (UTC). 1546300800 represents '2019-01-01 00:00:00' UTC. However, as timestamp column has Asia/Istanbul (UTC+3) timezone specified, when outputting as string the value will be shown as '2019-01-01 03:00:00'
We have the millisUtc
we need to convert it to second and insert them. I think we could implement it as a follow up once we add support for timestamp with higher precision as well.
Can we add this
Clickhouse JDBC driver inserts datetime as string value as
yyyy-MM-dd HH:mm:ss
and zone from the Column metadata would be used.
As a code comment as it would be a bit confusing on why we need to convert it into a specific timezone as it would be same as Instant.ofEpochMilli(millisUtc).atZone(columnTimeZone.toZoneId())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % #23785 (comment)
The time zone is not stored in the rows of the table, but is stored in the column metadata. From ClickHouse documentation https://clickhouse.com/docs/en/sql-reference/data-types/datetime The point in time is saved as a Unix timestamp, regardless of the time zone or daylight saving time. The time zone affects how the values of the DateTime type values are displayed in text format and how the values specified as strings are parsed (‘2020-01-01 05:00:01’). Timezone agnostic Unix timestamp is stored in tables, and the timezone is used to transform it to text format or back during data import/export or to make calendar calculations on the values (example: toDate, toHour functions etc.). The time zone is not stored in the rows of the table (or in resultset), but is stored in the column metadata.
a111835
to
63b8e52
Compare
The time zone is not stored in the rows of the table, but is stored in the column metadata.
From ClickHouse documentation
https://clickhouse.com/docs/en/sql-reference/data-types/datetime
The point in time is saved as a Unix timestamp, regardless of the time zone or daylight saving time. The time zone affects how the values of the DateTime type values are displayed in text format and how the values specified as strings are parsed (‘2020-01-01 05:00:01’).
Timezone agnostic Unix timestamp is stored in tables, and the timezone is used to transform it to text format or back during data import/export or to make calendar calculations on the values (example: toDate, toHour functions etc.). The time zone is not stored in the rows of the table (or in resultset), but is stored in the column metadata.
Description
Additional context and related issues
PRs:
ClickHouse docs:
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: